Abstract

Pneumocystis jirovecii is a fungal parasite that colonizes specifically humans and turns into an opportunistic pathogen in immuno- 
deficient individuals. The fungus is able to reproduce extracellularly in host lungs without eliciting massive cellular death. The mo- 
lecular mechanisms that govern this process are poorly understood, in part because of the lack of an in vitro culture system for 
Pneumocystis spp. In this study, we explored the origin and evolution of the putative biotrophy of P. jirovecii through comparative 
genomics and reconstruction of ancestral gene repertoires. We used the maximum parsimony method and genomes of related fungi 
of the Taphrinomycotina subphylum. Our results suggest that the last common ancestor of Pneumocystis spp. lost 2,324 genes in 
relation to the acquisition of obligate biotrophy. These losses may result from neutral drift and affect the biosyntheses of amino acids 
and thiamine, the assimilation of inorganic nitrogen and sulfur, and the catabolism of purines. In addition, P. jirovecii shows a reduced 
panel of lytic proteases and has lost the RNA interference machinery, which might contribute to its genome plasticity. Together with 
other characteristics, that is, a sex life cycle within the host, the absence of massive destruction of host cells, difficult culturing, and the 
lack of virulence factors, these gene losses constitute a unique combination of characteristics which are hallmarks of both obligate 
biotrophs and animal parasites. These findings suggest that Pneumocystis spp. should be considered as the first described obligate 
biotrophs of animals, whose evolution has been marked by gene losses. 
Introduction

Pneumocystis jirovecii is a fungal parasite causing 
Pneumocystis pneumonia in human beings with deficient 
immune system, a disease which is often fatal if untreated. 
The Pneumocystis genus includes several species, each display- 
ing strict host specificity for a given mammalian species, 
for example, P. jirovecii infects humans, P. carinii rats, and 
P. murina mice. These fungi are members of the 
Taphrinomycotina, a subphylum of Ascomycota (Redhead 
et al. 2006), which encompasses organisms with remarkably 
diverse lifestyles ranging from free-living saprophytes (e.g., 
Schizosaccharomyces spp.) to biotrophic plant pathogens 
(e.g., Taphrina spp.). 



Two categories of fungal parasitism are recognized: 
Necrotrophy where food is obtained from killed host cells, 
and biotrophy where food is obtained from living host cells 
(Kemen and Jones 2012). In general, necrotrophs secrete 
many cell wall degrading enzymes as well as toxins to kill 
host cells. On the other hand, biotrophs secrete low amounts 
of lytic enzymes and cause little damage to their host. 
Biotrophic lifestyle includes a continuum from facultative to 
obligate states (Kemen and Jones 2012; Spanu 2012). 
Facultative biotrophs require a living host only for a specific 
stage of their life cycle and can usually be easily grown 
axenically in the laboratory. Obligate biotrophs present an ab- 
solute requirement for their host to survive and usually cannot 
be grown axenically in the laboratory. Evolution toward obligate
biotrophy is generally associated with losses of pathways
whose end products can be scavenged from the host (Spanu
et al. 2010; Duplessis et al. 2011). In the case of human parasites,
true pathogens refer to those able to cause disease in
healthy individuals, whereas opportunistic pathogens are 
those causing disease only in the case of immunosuppression. 
These definitions are used in this work. 

Pneumocystis spp. dwell almost exclusively in mammalian 
lungs where they apparently do not trigger massive host cell 
death. They carry out their sex life cycle within their host, 
which is a hallmark of obligate parasitism and which has 
been observed among fungi only in plant biotrophs (Spanu 
2012). Moreover, attempts to obtain axenic in vitro culture 
have failed so far. Based on these characteristics, a parallel 
between Pneumocystis spp. and plant obligate biotrophs has 
been suggested (Cushion and Stringer 2010). Subsequent 
findings reinforced this resemblance. Indeed, Pneumocystis 
spp. have been suggested to be obligate parasites because 
of their relatively small genome with a high Adenine and 
Thymine (AT) content and the loss of the capacity of synthesizing
most amino acids (Hauser et al. 2010; Cisse et al. 2012).
Moreover, Pneumocystis spp. lack virulence factors commonly 
found in pathogenic fungi, such as glyoxylate cycle, toxins, 
and secondary metabolism (Cushion et al. 2007; Cisse et al. 
2012). 

The origin and evolution of biotrophy in Pneumocystis spp. 
can be explored through the hypothetical reconstruction of 
ancestral gene repertoires. This reconstruction is based on the 
reliable identification of homologous gene families across re- 
lated organisms. There are currently two methods for that 
purpose: The maximum parsimony (MP) and the maximum 
likelihood (ML) (Wolf and Koonin 2013). MP is designed to 
find the ancestral state that corresponds to the minimum 
number of character changes necessary to match the species 
phylogeny with the observed presence-absence patterns. In 
MP, different models of evolution can be applied, for example, 
the irreversible Dollo model (Farris 1977), which assumes that 
once an ancestral character is lost during the evolution of a 
particular organism, it cannot be regained. The ML method is
based on a stochastic birth-and-death model of evolution. This
model assumes that a new gene can appear only once in a 
phylogeny and that it can further evolve through duplication 
and loss in the descendant lineages. ML is based on the like- 
lihood that ancestral states would evolve under a stochastic 
model of evolution to reproduce the observed presence- 
absence patterns (Cohen et al. 2008; Csuros 2010). 
Although ML has been suggested to be more robust than 
MP (Wolf and Koonin 2013), this method requires large 
amounts of data and is generally not applicable to undersam- 
pled clades. 
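
As an illustration of the Dollo assumption described above, the following minimal Python sketch infers, for a single gene family, the one node where the family is gained and the branches on which it is lost. The toy tree, its node labels, and the function name are hypothetical; the sketch is not the Count implementation used in this study.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy topology (an assumption for illustration only).
TOY_TREE: Dict[str, List[str]] = {
    "root": ["a3", "a5"],
    "a3": ["Spombe", "Soctosporus"],
    "a5": ["Tdeformans", "a4"],
    "a4": ["Pjirovecii", "Pcarinii"],
}

def dollo_reconstruct(
    children: Dict[str, List[str]], root: str, present: Set[str]
) -> Tuple[Optional[str], Set[str], List[Tuple[str, str]]]:
    """Return (gain node, nodes carrying the family, loss branches)."""
    has: Dict[str, bool] = {}

    def visit(node: str) -> bool:
        # Post-order pass: does any leaf below this node carry the family?
        kids = children.get(node, [])
        if not kids:
            has[node] = node in present
        else:
            has[node] = any([visit(k) for k in kids])
        return has[node]

    visit(root)
    if not has[root]:
        return None, set(), []

    # Single gain: walk down while only one child subtree carries the family.
    node = root
    while True:
        carrying = [k for k in children.get(node, []) if has[k]]
        if len(carrying) != 1:
            break
        node = carrying[0]
    gain = node

    # Below the gain node the family persists wherever a descendant has it,
    # and is lost (irreversibly) on branches leading to empty subtrees.
    nodes: Set[str] = set()
    losses: List[Tuple[str, str]] = []
    stack = [gain]
    while stack:
        current = stack.pop()
        nodes.add(current)
        for kid in children.get(current, []):
            if has[kid]:
                stack.append(kid)
            else:
                losses.append((current, kid))
    return gain, nodes, losses
```

In this toy example, a family observed in Spombe, Soctosporus, and Tdeformans is inferred to be gained at the root and lost once, on the branch from "a5" to "a4".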

In this study, we have determined ubiquitous and species- 
specific gene families in the Taphrinomycotina subphylum. 
We then applied a parsimonious scenario to reconstruct 



ancestry of several lineages, and inferred gene families' 
losses and gains. Our results suggest that the acquisition of 
biotrophy in Pneumocystis spp. was marked by gene losses in 
biosynthetic pathways, whose products are likely to be scav- 
enged from the lung environment. 

These findings provide insights into the intimate relation of 
these microorganisms with their mammalian hosts. 

Materials and Methods 

Data Sources and Proteome Completion 

Incomplete versions of the proteomes of P. jirovecii and 
T. deformans are present in UniProt (www.uniprot.org, last 
accessed November 11, 2013). Therefore, we completed the 
protein data sets with manually curated gene predictions 
(Cisse et al. 2012, 2013). These data sets as well as P. carinii 
predicted proteins and expressed sequence tags are available 
at http://www.chuv.ch/imul/imu_home/imu_recherche/imu_ 
recherche_hauser/imu-phauser-suppldata.htm (last accessed 
July 27, 2014). Pneumocystis carinii genomic data sets and 
predicted proteins correspond to data published by others 
(Cushion et al. 2007; Hauser et al. 2010; Cisse et al. 2012; 
Slaven et al. 2006). Pneumocystis jirovecii genome raw reads 
and transcriptome assembly, as well as T. deformans ex- 
pressed sequence tags (ESTs) were downloaded from EBI 
(http://www.ebi.ac.uk, last accessed November 11, 2013). 
The transcriptome assemblies and ESTs were translated into 
the six open reading frames. The predicted proteomes of 
Saccharomyces cerevisiae, Ustilago maydis, Neurospora 
crassa, Yarrowia lipolytica, Ashbya gossypii, Rhizopus oryzae, 
Aspergillus fumigatus, and Debaryomyces hansenii were ob- 
tained from UniProt (www.uniprot.org, last accessed August 
26, 2013). The proteomes of S. pombe, S. octosporus, S.
japonicus, and S. cryptosporus were from the Broad Institute
(http://www.broadinstitute.org, last accessed November 5,
2013), and correspond to data published by Rhind et al.
(2011). The level of completion of each protein data set was
estimated by scanning with 458 core conserved eukaryotic
proteins from the CEGMA pipeline (Parra et al. 2007) using
hmmer3 (Eddy 2011) with an e value of 10^-5 as threshold.
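
The six-frame translation of transcripts and ESTs mentioned above can be sketched as follows. This is a minimal illustration assuming the standard genetic code and unambiguous DNA input; the function names are hypothetical and the tool actually used for translation is not stated in the text.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )

def six_frames(dna: str) -> list:
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (dna, reverse)
            for offset in range(3)]
```

Each of the six resulting protein strings can then be searched in the same way as a predicted proteome.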

Gene Families and Annotation 

Gene families were built using OrthoMCL (Li et al. 2003) with 
a Markov inflation of 1.5 and a maximum e value of 10^-5.
Repetitive elements were identified using TransposonPSI
(http://transposonpsi.sourceforge.net/, last accessed July 27,
2014) and TBLASTn searches with an e value of 10^-10
(Altschul et al. 1997) in Repbase (Jurka et al. 2005).
Enrichment in Gene Ontology (GO) terms between different 
groups was identified using Fisher's exact test with a P value of 
0.05 and corrected for multiple testing using BLAST2GO 
(Conesa et al. 2005). Enzyme prediction and mapping to 
KEGG biochemical pathways were performed using 
BLAST2GO and PRIAM (Claudel-Renard et al. 2003). Proteases
were classified according to MEROPS database (Rawlings et al. 
2012). 
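
The enrichment test named above can be illustrated with a one-sided Fisher's exact test computed from the hypergeometric distribution. This is a minimal sketch with a hypothetical function name; it does not reproduce BLAST2GO, and the multiple-testing correction is left out because the text does not specify which procedure was applied.

```python
from math import comb

def fisher_one_sided(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes carrying the GO term,
    n: genes in the test group, k: test-group genes carrying the GO term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

A GO term would be called enriched in the test group when this probability falls below the chosen threshold (0.05 in this study, before correction).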

Ancestral Genome Reconstruction and Gene Gains and 
Losses 

The species phylogeny was obtained from our previous study 
(Cisse et al. 2012) and edited using FigTree (http://tree.bio.ed. 
ac.uk/software/figtree/, last accessed July 27, 2014). The true 
phylogenetic tree along with bootstrap values and branch 
lengths is shown in supplementary figure S3, Supplementary 
Material online. Gene presence and absence patterns were 
extracted from OrthoMCL proteomes clustering using 
custom Perl scripts. This matrix was used to infer ancestral 
genomes using Count (Csuros 2010). Count uses phyloge- 
netic birth-and-death models in a stochastic probability infer- 
ence. The rate models were computed and optimized under 
the gain-loss-duplication model with the Poisson family size 
distribution. The rate of variation across families was set to 
4:1:1:4 gamma categories for the edge length, the loss rate,
gain rate, and the duplication rate, respectively. The conver- 
gence criteria applied were set to 100 rounds for the optimi- 
zation rounds with a likelihood threshold of 0.1. The family 
history was determined under the Dollo parsimony 
assumption. 
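
The extraction of presence and absence patterns from the clustering can be sketched as below. The study used custom Perl scripts; this Python version is only an illustration, and it assumes that each cluster member is named "taxon|protein_id", which is a naming convention chosen here and not confirmed by the text.

```python
from collections import Counter
from typing import Dict, List

def presence_absence(
    clusters: Dict[str, List[str]], taxa: List[str]
) -> Dict[str, List[int]]:
    """Map each gene family to a 0/1 vector ordered like `taxa`."""
    matrix: Dict[str, List[int]] = {}
    for family, members in clusters.items():
        # Count members per taxon from the assumed "taxon|protein_id" labels.
        counts = Counter(member.split("|", 1)[0] for member in members)
        matrix[family] = [1 if counts[taxon] > 0 else 0 for taxon in taxa]
    return matrix
```

Each row of the resulting matrix is the kind of per-family pattern that is then projected onto the species phylogeny.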

Specific Gene Searches 

All searches were first performed in the proteomes. In case of 
absence, searches were systematically carried out on the 
translated transcriptome assemblies, genomes, and finally 
on raw sequences reads if necessary, to confirm the absence. 

RNA Interference Machinery 

The S. pombe Dicer and Argonaute proteins were used for
screening using BLASTp (e value < 0.01) and TBLASTn
(e value < 10^-3), respectively. The hidden Markov model-based
searches were conducted using hmmer3 (e value < 0.1) with the
following Pfam models (Punta et al. 2012): PAZ (Pfam accession
no. PF02170), dicer dimerization (PF03368), helicase
conserved C-terminal (PF00271), and ResIII (PF04851). The results
were manually examined to remove false positives. The S.
cerevisiae killer virus was used as template for mapping using
P. jirovecii unassembled genome reads and Newbler's
gsMapper. Homologs to ScV-A (Gag, major capsid protein),
ScV-M1 (K1 preprotoxin), ScV-M2 (K2 preprotoxin), ScV-M28 
(K28 preprotoxin), and ZbV-M (Zygocin preprotoxin) were 
searched through the P. jirovecii genome and raw reads 
using TBLASTn (e value < 0.1). Sequence accession numbers 
and details are given in Schmitt and Breinig (2006). 



Thiamine Biosynthesis Pathway 

To search the hallmark genes of the thiamine biosynthesis, the 
S. pombe thiamine thiazole synthase (thi2), the thiamine
pyrophosphokinase (tnr3), the phosphomethylpyrimidine
kinase, and the thiamine biosynthetic bifunctional enzyme
(thi4) were used for screening using BLASTp (e value < 0.01)
and TBLASTn (e value < 10^-3). We ran hmmer3
(e value < 0.1) using the different domains composing the
S. pombe proteins, that is, phosphomethylpyrimidine kinase
(PF08543), TENA/THI-4/PQQC family (PF03070),
hydroxyethylthiazole kinase family (PF02110), thiamine
monophosphate synthase (PF02581), and thiamine pyrophosphokinase
(PF04265 and PF04263). For the thi2 gene, we searched 
for the InterPro domain thiazole biosynthetic enzyme family 
(IPR002922). The results were manually inspected to avoid 
false positives. 

Nitrogen Metabolism 

We used the T. deformans nitrite and nitrate reductases as 
query in order to screen using BLASTp (e value < 0.01) and 
TBLASTn (e value < 10^-3). Domain-based searches were conducted
with hmmer3 (e value < 0.1) with the following characteristic
domains: FAD-binding (PF00970), oxidoreductase NAD-binding
(PF00175), nitrite/sulfite reductase ferredoxin-like
half (PF03460), and nitrite and sulphite reductase 4Fe-4S
(PF01077). We retrieved all proteins containing at least one
domain and aligned them using MAFFT (Katoh and Standley
2013) with the E-INS-i option to take into account the presence
of multiple domains. As these proteins belong to large protein 
families, the results were manually examined to remove false 
positives (e.g., flavoproteins, sulfite reductases, or cytochrome 
reductases). 

Sulfite Metabolism and Molybdopterin Biosynthesis 

The S. pombe proteins were used for the screening using
BLASTp (e value < 0.1) and TBLASTn (e value < 10^-3). The
molybdopterin biosynthesis genes were identified in T. deformans
based on protein similarity with the genes of the obligate
biotroph Albugo laibachii (Kemen et al. 2011). Domain-based
searches were conducted using hmmer3 (e value < 0.1)
with the following characteristic domains: FAD binding
(PF00667), oxidoreductase NAD-binding (PF00175), Mo-co
oxidoreductase dimerisation (PF03404), oxidoreductase 
molybdopterin-binding domain (PF00174), Molybdenum 
cofactor synthesis C (PF06463), Radical SAM superfamily 
(PF04055), MoaE (PF02391), molybdopterin binding 
(PF00994), MoeA C-terminal region (PF03454), and MoeA 
N-terminal region (PF03453). 

Transporters 

The transporters were identified using two complementary 
approaches. The proteomes were compared with the tcdb 
proteins collection (Saier et al. 2009) using ssearch (Pearson
1991). In parallel, putative amino acid transporters were identified
using BLAST2GO with BLASTp (e value < 10^-10) and InterProScan
options. The sequences were then classified according to the
tcdb families' classification system to assess putative substrate.
Statistical tests were performed using R (R Development Core
Team 2008).

Data Accessibility 

Proteome clustering results and matrix, scripts, and sequences 
alignments are available from GitHub (https://github.com/ 
ocisse/Pneumocystis_comparative, last accessed July 27, 
2014). 

Results 

Identification of Orthologous Gene Families 

To investigate the evolutionary dynamics of gene families 
in the Taphrinomycotina subphylum, we established 
the orthologous relationships among the proteomes 
of seven Taphrinomycotina, four Saccharomycotina, two 
Pezizomycotina, one Ustilaginomycotina, and one 
Mucormycotina. This corresponded to a set of 101,030 pre- 
dicted proteins. Based on the presence of 458 core eukaryotic 
proteins (Parra et al. 2007), P. jirovecii predicted proteins and 
its de novo assembled transcripts showed a completeness of 
89% and 70%, respectively. There was an overlap of 98% 
between these two latter sets, suggesting that most of the 
P. jirovecii genes were captured (supplementary table S1, 
Supplementary Material online). Except that of P. carinii with 
79%, the fungal proteomes had a completeness of at least 
97%. The proteomes clustering yielded 8,368 families and 
2,143 single taxa clusters (i.e., containing at least two proteins 
from the same species), which included 81% of P. jirovecii
predicted proteins. 

P. jirovecii Proteome Content 

Sixty-eight percent of the 3,898 P. jirovecii predicted proteins 
had at least one ortholog in the proteome of another fungus, 
including in a distantly related one, suggesting an ancient 
origin (supplementary fig. S1, green and blue slices, 
Supplementary Material online). These conserved genes 
were primarily devoted to the basal cellular and metabolic 
activities (supplementary table S1, Supplementary Material 
online). Four percent of the P. jirovecii proteins were genus 
specific and enriched for glycoprotein biosynthetic activity 
(supplementary fig. S1, red slices, Supplementary Material 
online). As a comparison, the Schizosaccharomyces genus- 
specific genes represented 10% (pink slices), suggesting a 
greater genetic innovation than in the Pneumocystis genus. 
Two percent of P. jirovecii proteins, which encode mostly hy- 
pothetical proteins, were included in gene families specific to 
this organism (light gray slices). 



Gene Families' Gain and Loss Patterns in Reconstructed 
Ancestors 

Biotrophy in Pneumocystis and Taphrina lineages would have 
appeared in ancestors as a consequence of a shift of host or 
environment. Thus, the gene families' losses and gains in these 
ancestors should provide insights regarding the possible ac- 
quisition of biotrophy. To obtain a dynamic view of these 
events, we derived a gene presence-absence matrix from 
the gene clustering and projected these patterns onto a spe- 
cies phylogeny obtained from 458 shared single copy ortho- 
logs (Cisse et al. 2012). We then applied a parsimonious 
evolutionary scenario using the MP method based on the 
Dollo model to infer gene gain and loss events. The recon- 
structed ancestor of Pneumocystis spp. "a4" appeared to 
have had at least 3,113 gene families, which indicated a reduction
of approximately 30% as compared with the ancestors
of T. deformans "a5" and of Schizosaccharomyces spp.
"a3" (fig. 1). There was a significant difference in the number 
of gene families' gains between Taphrinomycotina members 
and their reconstructed ancestors (Wilcoxon rank sum test 
[one-sided test], P = 0.02), with the highest number in the
ancestor of Schizosaccharomyces spp. "a3" (n = 529). 
The difference in the number of gene losses among the 
Taphrinomycotina members was also significant (Wilcoxon 
rank sum test, P = 0.01), with the highest number in the last
common ancestor of the Pneumocystis spp. "a4" (n = 1,819).
These observations held true when the numbers of gene gains 
and losses were normalized by proteome size (data not 
shown). 

Gene Losses Associated with the Acquisition of Obligate 
Biotrophy in the Pneumocystis Genus 

We focused on two time points in relation to the possible 
acquisition of obligate biotrophy in Pneumocystis ancestry: 
The common ancestor of the Taphrina and Pneumocystis 
genera "a5," and the ancestor of the Pneumocystis genus
"a4." Notably, T. deformans and Pneumocystis spp. display
distinct behaviors, that is, the former can be grown axenically
in vitro (Fonseca and Rodrigues 2011), whereas the latter
cannot so far. Thus, we hypothesized that gene losses
occurred during the transition from "a5" to "a4." We
optimized rates for gene birth-and-death models and applied Dollo
parsimony to infer gene families' gain and loss events. The
investigation of the matrix of these events highlighted 2,324 
ancestral genes that are present in "a5," and subsequently 
lost in "a4." In this analysis, we required that the inferred 
ancestral genes were also conserved in both Taphrina and 
Schizosaccharomyces lineages. These genes were significantly 
enriched for the following GO terms: Oxidation-reduction 
(Fisher's exact test, P = 4.5 x 10^-11), carbohydrate metabolism
(P = 2.3 x 10^-4), amino acid transport (P = 8.7 x 10^-4),
and sulfur compound metabolic process (P = 0.009) (supplementary
table S3, Supplementary Material online).
Fig. 1. — Lifestyle and projected gene families' gain and loss patterns in representative fungal genomes. The gene families' loss and gain patterns are
projected onto each branch of the species phylogeny. The fungal species phylogeny corresponds to a cladogram derived from the ML tree of our previous
study (Cisse et al. 2012; supplementary fig. S3, Supplementary Material online). The numbers on the branches of the phylogenetic tree correspond to the gene
families' gains (+) and losses (-). The lifestyle of the different species is indicated by the color of the branch (blue for saprotrophs, red for animal pathogens,
and green for plant pathogens). Biotrophic organisms are indicated by Petri dishes, red crosses indicating the absence of a culture system. Ancestral gene
families' content was reconstructed using the Dollo parsimony assumption (represented by the gray dotted lines and the letter "a" which stands for ancestor).
For each species, the number of gene families, genes, and orphan genes is displayed. The number of gene families is also displayed for inferred ancestral taxa.



To investigate whether gene losses could have occurred 
through massive pseudogenization, we conducted a compre- 
hensive search of pseudogenes in P. jirovecii (supplementary 
material, Supplementary Material online; P. carinii could not 
be analyzed because of its high fragmentation). We found 
evidence of 75 pseudogenes (supplementary table S7, 
Supplementary Material online), which correspond to a ratio 
of 0.02 pseudogene per annotated protein-coding gene. 
This value is close to those of 0.03 and 0.01 reported, respec- 
tively, for S. cerevisiae and S. pombe (Echols et al. 2002; Wood 
et al. 2002).
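
The pseudogene ratio reported above can be checked with a short calculation. The sketch assumes that the denominator is the 3,898 predicted P. jirovecii proteins given elsewhere in the Results; the text does not state the denominator explicitly.

```python
pseudogenes = 75
# Assumption: the 3,898 predicted proteins stand in for the number of
# annotated protein-coding genes.
protein_coding_genes = 3898

ratio = pseudogenes / protein_coding_genes
print(round(ratio, 2))  # 0.02
```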



Loss of Amino Acid and Purine Metabolisms 

Among the 2,324 genes present in ancestor "a5" but lost in 
the Pneumocystis genus, 183 enzymes were mapped, and 
42% of them turned out to be involved in the amino acid 
and purine metabolisms (fig. 2). The overrepresentation of the 
amino acid metabolism category is consistent with the previ- 
ously reported loss of most enzymes dedicated to the biosyn- 
thesis of amino acids (Hauser et al. 2010; Cisse et al. 2012). 
These acquired auxotrophies imply scavenging of these com- 
pounds from the host or lung environment. Many fungi use 
Fig. 2. — Functional categories of ancestral genes lost in the Pneumocystis genus. The 2,324 ancestral genes that appeared to be present in the last
common ancestor "a5," and subsequently lost in the Pneumocystis genus, were reconstructed using MP and mapped onto KEGG biochemical pathways 
atlas. 



various amino acid permeases and oligopeptide transporters 
to scavenge amino acids from their environment. For example, 
the biotrophic rust fungus Uromyces fabae uses an AAT1p
amino acid transporter for histidine uptake (Struck et al. 
2002), and the necrotroph fungus Fusarium oxysporum uses 
a general permease for amino acids uptake (Divon et al. 
2005). We screened the proteomes of all species used in 
this study to identify transporters. Pneumocystis jirovecii and 
P. carinii displayed a significant reduction in the amino 
acid/polyamine/organocation family as compared with other 
members of the Taphrinomycotina subphylum (Fisher's exact 
unilateral test, P value = 1.3 x 10^-7; supplementary table S3,
Supplementary Material online). The major facilitator super- 
family, which are versatile transporters that can carry amino 



acids, also showed a significant reduction in Pneumocystis 
spp. (Fisher's exact unilateral test, P value = 4.08 x 10^-15; supplementary
table S3, Supplementary Material online).

Loss of Purine Degradation Pathway 

Nineteen percent of the 183 ancestral enzymes conserved in 
T. deformans but lost in the Pneumocystis genus are related to 
purine metabolism (fig. 2). This metabolism includes the syn- 
thesis and degradation of adenine and guanine, which are 
fundamental and well-conserved pathways across all life 
forms (Caetano-Anolles et al. 2007). The purine degradation 
allows the recycling of purines and their use as secondary 
source of nitrogen in case of limiting conditions. We de- 
tected the presence of the hallmark genes of the inosine 



Cisse et al. Genome Biol. Evol. 6(8):1938-1948. doi:10.1093/gbe/evu155 Advance Access publication July 24, 2014



[Fig. 3. Purine degradation pathway. Metabolites, in order: hypoxanthine -> xanthine -> uric acid -> 5-hydroxyisourate (HIU) -> 2-oxo-4-hydroxy-4-carboxy-5-ureidoimidazoline (OHCU) -> (S)-allantoin -> allantoate -> (S)-ureidoglycolate and urea -> glyoxylate and ammonia. Enzymes shown: xanthine dehydrogenase/oxidase, urate oxidase, HIU hydrolase, OHCU decarboxylase, allantoinase, allantoicase, ureidoglycolate hydrolase, and urease. Species compared: P. jirovecii and P. carinii, T. deformans, and S. pombe.]
... and allantoic acid). The pathway has not been experimentally characterized in T. deformans.



5'-phosphate and uridine 5'-phosphate biosynthesis pathways
in P. jirovecii and P. carinii (supplementary table S5, 
Supplementary Material online). These compounds are the 
precursors of the synthesis of purines and pyrimidines, sug- 
gesting that the de novo purine biosynthesis takes place in 
Pneumocystis spp. However, no matches for any of the catabolic
enzymes involved in the degradation of purines were detected 
in P. jirovecii and P. carinii, whereas most of them were present 
in T. deformans and Schizosaccharomyces spp. (fig. 3 and 
supplementary table S4, Supplementary Material online). 

Loss of Nitrogen and Sulfur Assimilation Pathways 

Nitrogen and sulfur are essential for life. The key enzymes of 
the inorganic nitrogen and sulfur metabolisms, that is, the 
nitrate and nitrite reductases, as well as the sulfite reductase 
and oxidase, were found to be absent in the Pneumocystis 
and Schizosaccharomyces genera, while being preserved 
in T. deformans (supplementary table S4, Supplementary 
Material online). The nitrate reductase is a molybdenum-dependent
enzyme, which is involved in the nitrate reduction
during the nitrogen acquisition (Campbell 1999). Consistently, 



the complete molybdopterin biosynthesis pathway was lost in 
the Pneumocystis and Schizosaccharomyces genera (supple- 
mentary table S4, Supplementary Material online). 

Loss of Thiamine Biosynthesis Pathway 

Thiamine is an essential cofactor required by almost all living 
organisms. No matches for the key enzymes thiamine thiazole 
synthase and accessory enzyme kinases, except the thiamine 
pyrophosphokinase, were detected in P. jirovecii and P. carinii 
(supplementary table S4, Supplementary Material online). 
These enzymes appeared to be highly conserved in other 
members of the Taphrinomycotina subphylum, except the 
thiamine thiazole synthase in T. deformans. 

Loss of RNAi Machinery 

The RNAi machinery is involved in the genome defense by 
controlling the movement of repetitive mobile elements and 
the regulation of chromatin modification. No matches for the 
Argonaute and Dicer proteins were detected in P. jirovecii 
and P. carinii, whereas they were present in T. deformans 
and Schizosaccharomyces spp. (supplementary table S4,
Supplementary Material online). 

Virulence Associated Proteases 

We analyzed several secreted peptidases of the MEROPS sub- 
families, that is, A01, C13, G01, M35, S08, S09, S10, and S14.
These are important fungal virulence factors (Monod et al. 
2002), which might be involved in the degradation of the 
extracellular matrix in human lungs. The number of these pro- 
teases in Pneumocystis spp. is slightly lower (0.3-0.4% of their 
proteomes) than in most of the other Taphrinomycotina mem- 
bers (0.4-0.7%) (supplementary table S5, Supplementary 
Material online). Interestingly, P. jirovecii and P. carinii along 
with T. deformans harbor a specific ATP-dependent Clp protease
of the S14 family involved in the hydrolysis of proteins
(UniProt accession: L0PC06), which is lacking in the 
Schizosaccharomyces spp. On the other hand, Pneumocystis 
spp. showed an underrepresentation of the A01 and S09 
families which have been suggested to be crucial virulence 
factors in many fungi such as the human pathogen Candida 
albicans (Schaller et al. 2005). 

Pneumocystis spp. Genomic Features 

In order to investigate their evolution, several features of 
Pneumocystis spp. genomes were analyzed (supplementary 
material, Supplementary Material online). We estimated 
the genome-wide Ka/Ks ratio by the analysis of 898 single 
copy orthologous genes shared by Pneumocystis and 
Schizosaccharomyces spp. The mean Ka/Ks values of these 
genes were, respectively, 0.06 and 0.05 in the two genera 
(supplementary fig. S2A, Supplementary Material online). 
These values are smaller than one which suggests that these 
genes evolve under purifying selection. 

The proportion of repetitive DNA in P. jirovecii genome, 
including transposons, retrotransposons, and low complexity 
DNA, but excluding the subtelomeric gene families, was esti- 
mated to 9.8% (supplementary table S6, Supplementary 
Material online; P. carinii could not be analyzed because of 
its high fragmentation). This value is higher than those of the 
plant pathogen T. deformans (1.5), of the free-living yeasts 
S. cerevisiae (4.8) and S. pombe (5.2), and of the extreme
obligate parasite Encephalitozoon cuniculi (0.7). 

Further analyses revealed intergenic region lengths in 
P. jirovecii which are similar to those observed in the free- 
living yeasts and T. deformans, but larger than those of 
E. cuniculi (supplementary fig. S2B, Supplementary Material 
online). The gene density observed in P. jirovecii (481 genes 
per Mb) was similar to those of S. cerevisiae (487), S. pombe
(554), and T. deformans (431), but clearly lower than that 
observed in E. cuniculi (816; supplementary table S6, 
Supplementary Material online). 



Discussion 

Biotrophy is a character that has emerged independently in 
various fungal clades as well as in oomycetes (Kemen and 
Jones 2012; Spanu 2012), and which has been described ex- 
clusively in plant parasites so far. This study was undertaken to 
explore the signature of biotrophy in the human fungal para- 
site P. jirovecii. By reconstructing the ancestral genomes of the 
Taphrinomycotina subphylum, we discovered several gene 
losses in the Pneumocystis genus compatible with its obligate 
biotrophy. It is unlikely that we missed these genes because of 
the incompleteness of the genomes as 1) the losses were ob- 
served in both P. jirovecii and P. carinii, and 2) the searches 
were conducted on independent and partially overlapping 
genomic and transcriptomic data sets, which are expected 
to cover 100% of the genomes. 

Because a large proportion of the genome was found to be 
under purifying selection, there is a high probability that the 
losses of genes we detected in Pneumocystis spp. have resulted
from neutral genetic drift rather than from positive selection.
Genetic drift might have been favored by the relatively
small effective population size of Pneumocystis spp., which is
expected from the strict host specificity and absence of free- 
living forms of these fungi. The relatively large proportion of 
repetitive DNA, large intergenic region lengths, and low gene 
density in P. jirovecii are also more compatible with the genetic 
drift hypothesis than with positive selection. Furthermore, the 
ratio of pseudogenes in P. jirovecii was found to be small and 
similar to those of nonbiotrophic free-living organisms, sug- 
gesting that a massive appearance of such nonfunctional 
genes has not been a major mechanism of gene loss. 

Most amino acid biosynthesis pathways were lost in Pneumocystis 
spp. (Hauser et al. 2010; Cisse et al. 2012), a hallmark of 
obligate parasitism. Such a loss has not been observed in plant 
biotrophs so far (Spanu 2012), but occurs systematically in 
organisms feeding on other animals, such as protozoan parasites of 
humans, as well as, for example, in Homo sapiens (Payne and 
Loomis 2006). In fungal biotrophs, losses of biosynthetic 
pathways are generally compensated by expanded transporter 
families that scavenge the products of these pathways (Spanu 
et al. 2010; Duplessis et al. 2011). However, we observed a 
reduced panel of transporters likely to carry amino acids 
in Pneumocystis spp. This reduction may be linked to the loss 
of the amino acid biosynthesis pathways and suggests the 
use of an alternative strategy. Biochemical experiments 
showed that amino acid uptake in P. carinii may occur by 
facilitated diffusion across the membrane (Basselin, Lipscomb, 
et al. 2001; Basselin, Qiu, 2001). Pneumocystis spp. have been 
previously suggested to scavenge other crucial compounds 
from their hosts, that is, sterols (Joffrion and Cushion 2010). 

Nitrogen and sulfur assimilation pathways, as well as 
purine catabolism, were lost in Pneumocystis spp. In the absence 
of assimilation from inorganic sources, nitrogen and 
sulfur are essential compounds which can be obtained from 
the degradation of purines, or from proteolysis. Purine catabolism 
is unlikely to be a source in P. jirovecii and P. carinii 
because we did not detect the key genes of this pathway. In 
the absence of sulfite oxidase, which is required to 
metabolize cysteine and methionine, proteolysis does not seem 
to be the source of sulfur; this source thus remains to be 
characterized. On the other hand, proteolysis could be the 
source of nitrogen, which would be consistent with the fact 
that animal tissues are rich in proteins. This would contrast 
with the saprotroph S. pombe, which uses purine 
degradation for this purpose (Kinghorn and Fluri 1984). The loss of 
purine degradation has not been observed in fungal 
biotrophs so far and thus represents a specificity of Pneumocystis 
spp., which might also be related to their adaptation to animal 
hosts. By contrast, the loss of inorganic nitrogen and 
sulfur assimilation is frequently observed in fungal plant 
biotrophs (Spanu 2012), but this is not a hallmark of biotrophy as 
it is also observed in Schizosaccharomyces spp., which are 
saprotrophs. An interesting alternative to the possibility of acquiring 
essential compounds from the host is the Black Queen hy- 
pothesis (Morris et al. 2012). This hypothesis proposes that 
the loss of capability at the individual level can be compen- 
sated by the community. In the case of Pneumocystis spp., this 
community would be the other microorganisms sharing the 
same environment. This could be investigated through studies 
of the metabolic capacities of the lung microbiota associated 
with Pneumocystis pneumonia. 

The key enzymes for thiamine biosynthesis were not 
detected in Pneumocystis spp. This loss is also a hallmark of 
biotrophy (Spanu 2012) and suggests that this vitamin is 
available from the mammalian host lungs. The missing thiamine 
thiazole synthase is a turnover suicide enzyme that needs to 
be replaced after each reaction (Chatterjee et al. 2011). 

Our searches revealed that the RNAi machinery is lost in 
Pneumocystis spp. This loss would have occurred after the 
divergence of the Taphrina and Pneumocystis genera but 
before diversification in the Pneumocystis genus. The removal 
of RNAi has been suggested to confer a selective advantage to 
pathogens because it would enhance genome evolution by 
allowing the free movement of transposons and retrotranspo- 
sons (Oliver and Greene 2009). The RNAi machinery is also 
absent in the fungi S. cerevisiae, U. maydis, and Cryptococcus 
gattii, as well as in the protozoan parasites Trypanosoma cruzi 
and Plasmodium falciparum (Nicolas et al. 2013). The signifi- 
cance of this loss remains to be elucidated. 

Our findings suggest that obligate biotrophy arose in 
Pneumocystis spp. in conjunction with gene losses of multiple 
pathways. The vital compounds that can be scavenged from 
the host lungs probably allowed these losses. Together with 
other characteristics, that is, a sexual life cycle within the host, the 
absence of massive destruction of host cells, the absence of an 
axenic in vitro culture system so far, and the lack of virulence 
factors, these gene losses strongly suggest that Pneumocystis 
spp. should be considered as obligate biotrophs. The 
reduction of the genome size of Pneumocystis spp. contrasts 
with the increase of this feature often observed in fungal plant 
obligate biotrophs (Spanu 2012). This increase corresponds to 
a proliferation of retrotransposons, which would create ge- 
netic variability and diversity including panels of effectors. As 
in oomycetes, the genome size reduction in Pneumocystis spp. 
might be compensated by the genetic diversity generated by 
sexuality, suggesting that they might be "obligate sexual or- 
ganisms" (Spanu 2012). Another hypothesis is that the niches 
in animal hosts do not require a variety of effectors to sustain 
infection but rather an increased fitness. This increased fitness 
might be obtained by the reduction of genome size, a feature 
which is observed in most human parasites, eukaryotic as well 
as prokaryotic (Sakharkar 2004; Hauser et al. 2010). 

In contrast to necrotrophy, biotrophy implies a long-lasting 
and tight relationship with the living host and its immune 
system. Such a relationship requires mechanisms which are 
expected to be specific to each host. One mechanism might be 
the antigenic variation encoded by the contingency gene 
families present at the telomeres of Pneumocystis spp., which may 
help maintain the infection despite the host immune 
system. Thus, one can speculate that the strict host specificity 
and the biotrophy of Pneumocystis spp. were acquired simul- 
taneously. The obligate nature of biotrophy may have 
emerged later and implied coevolution with the host. 

A limitation of our study was the use of the Dollo parsi- 
mony model, which assumes that lost ancestral traits cannot 
be reacquired. Indeed, the model is not appropriate to detect 
genes brought by horizontal gene transfer events, which are 
thought to be relatively frequent in lower eukaryotes such as 
fungi (Fitzpatrick 2012). The possible occurrence of such 
transfers, which may have brought important new traits within the 
Pneumocystis genus, deserves further studies. 
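
To make the Dollo assumption concrete, the minimal Python sketch below places a single gain at the last common ancestor of all species carrying a gene and one loss on each maximal subtree that lacks it. The four-species topology, the presence/absence patterns, and the names `leaves` and `dollo_events` are illustrative assumptions; they are not the data set or the software used in this study.

```python
# Toy rooted tree as nested tuples; leaves are species names (assumed topology).
TREE = ((("P. jirovecii", "P. carinii"), "T. deformans"), "S. pombe")


def leaves(node):
    """Return the set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    result = set()
    for child in node:
        result |= leaves(child)
    return result


def dollo_events(tree, present):
    """Infer events for one gene under Dollo parsimony.

    Returns (gain, losses): `gain` is the leaf set of the clade whose
    ancestor gained the gene, and `losses` lists the leaf sets of the
    maximal subtrees in which the gene was lost. Gains happen only once.
    """
    present = set(present) & leaves(tree)
    if not present:
        return None, []
    # Walk down to the last common ancestor of all carriers.
    node = tree
    while not isinstance(node, str):
        holders = [c for c in node if leaves(c) & present]
        if len(holders) != 1:
            break
        node = holders[0]
    gain = frozenset(leaves(node))
    # Below the gain, every carrier-free subtree counts as one loss.
    losses = []
    stack = [node]
    while stack:
        current = stack.pop()
        if isinstance(current, str):
            continue
        for child in current:
            if leaves(child) & present:
                stack.append(child)
            else:
                losses.append(frozenset(leaves(child)))
    return gain, losses


if __name__ == "__main__":
    # Hypothetical gene kept by T. deformans and S. pombe only.
    print(dollo_events(TREE, {"T. deformans", "S. pombe"}))
    # Hypothetical patchy pattern, as a horizontal transfer could produce.
    print(dollo_events(TREE, {"P. jirovecii", "S. pombe"}))
```

In the first hypothetical pattern, the sketch infers one loss on the branch leading to the Pneumocystis ancestor. In the second, a gene found only in P. jirovecii and S. pombe is explained by an ancestral gain followed by two independent losses, which illustrates why a transfer into a single lineage cannot be recognized under this model.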

In conclusion, Pneumocystis spp. harbor a unique combi- 
nation of characteristics which are hallmarks of both obligate 
biotrophs and animal parasites. The evolutionary events 
underlying biotrophy in Pneumocystis spp. and the closely related 
hemibiotroph T. deformans appear drastically different, which 
suggests a parallel rather than a convergent evolution. 
Pneumocystis spp. should be considered as the first described 
obligate biotrophs of animals, which can turn into opportu- 
nistic biotrophic pathogens upon immune deficiency of the 
host. The lifestyle of P. jirovecii differs from that of other 
human fungal parasites. Indeed, Malassezia and Candida 
spp. can be considered as obligate commensals and 
opportunistic necrotrophs, whereas dermatophytes would include 
both facultative and obligate true necrotrophs, which 
might possibly be transient commensals. 

Supplementary Material 

Supplementary material, tables S1-S7, and figures S1-S3 are 
available at Genome Biology and Evolution online 
(http://www.gbe.oxfordjournals.org/). 